import sys
from vaultdemo import demo
from branca.colormap import linear
from datetime import datetime
from ipyleaflet import (Map, GeoData, basemaps, WidgetControl, GeoJSON,
                        LayersControl, Icon, Marker, basemap_to_tiles,
                        Choropleth, MarkerCluster, Heatmap, SearchControl,
                        FullScreenControl)
from ipywidgets import Text, HTML
from shapely.geometry import Point
import geopandas as gpd
import json
import math
import os
import pandas as pd
import fiona
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
%matplotlib inline
import plotly.express as px
We received ??? rows in ??? files comprising ??? bytes, as described in the Tech Scenario.
In addition, we received one file that appears to be a registry of a subset of DoD AIS transponders.
def plot_daily(daily_counts_file, years):
    df = pd.read_csv(daily_counts_file,
                     sep='\t',
                     names=["date", "count"],
                     parse_dates=[0])
    for year in years:
        # Restrict the plot to one calendar year at a time.
        df_plot = df[(df['date'] > datetime(year, 1, 1)) & (df['date'] < datetime(year + 1, 1, 1))]
        fig = px.bar(df_plot, x="date", y="count",
                     range_x=[datetime(year, 1, 1), datetime(year + 1, 1, 1)])
        fig.show()
Distinct satellites: 18,817
Other Stats:
plot_daily("../data/VAULT_Data/stats_out/tle_daily_records.tab", list(range(2004,2019,1)))
plot_daily("../data/VAULT_Data/stats_out/ais_daily_records.tab", list(range(2009,2018,1)))
Distinct Ships:
Other stats:
Because satellite orbital data is considered excellent quality as long as it is no more than 48 hours out of date, we chose to organize the TLE data on a per-day basis, assigning one TLE line pair per satellite per day. This meant carrying older records forward when no new data was available for a given day.
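The carry-forward step can be sketched with pandas: assuming a DataFrame with one row per (satellite, date) record (the column names `norad_id`, `line1`, `line2` here are hypothetical), reindex each satellite onto a full daily calendar and forward-fill the most recent line pair.

```python
import pandas as pd

def fill_daily_tles(tles: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Assign one TLE line pair per satellite per day, carrying the most
    recent record forward across days with no new data.

    `tles` is assumed to have columns: norad_id, date, line1, line2.
    """
    calendar = pd.date_range(start, end, freq="D")
    filled = []
    for norad_id, group in tles.groupby("norad_id"):
        daily = (group.set_index("date")
                      .sort_index()
                      .reindex(calendar, method="ffill")  # bring older records forward
                      .assign(norad_id=norad_id))
        filled.append(daily)
    return pd.concat(filled).rename_axis("date").reset_index()

# Example: a satellite with records on Jan 1 and Jan 4 only.
tles = pd.DataFrame({
    "norad_id": [25544, 25544],
    "date": pd.to_datetime(["2017-01-01", "2017-01-04"]),
    "line1": ["L1a", "L1b"],
    "line2": ["L2a", "L2b"],
})
daily = fill_daily_tles(tles, "2017-01-01", "2017-01-05")
print(daily[["date", "line1"]])  # Jan 2-3 reuse the Jan 1 pair
```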
With this approach we found we could compute all of the satellite positions for a given day in 6 seconds on a single machine. Because the Spark resources available to this team take longer than 6 seconds to coordinate a task, we decided to leave this step unscaled. The algorithm could be scaled easily over a microservices architecture, but not "two weeks easy" with our current resources.
We did scale the algorithm as a whole, creating the following investigative time series using a Hive transform:
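Hive's `TRANSFORM` clause streams tab-separated rows through an external script's stdin and reads transformed rows back from its stdout. A minimal sketch of a transform script in that shape (the column layout `mmsi, timestamp, lat, lon` is an assumption, not the actual schema):

```python
import sys

def transform(line: str) -> str:
    """Reduce one tab-separated AIS row to the (mmsi, date, lat, lon)
    fields needed for a daily time series. Column layout is assumed."""
    mmsi, timestamp, lat, lon = line.rstrip("\n").split("\t")
    date = timestamp[:10]  # keep just the YYYY-MM-DD prefix
    return "\t".join([mmsi, date, lat, lon])

if __name__ == "__main__":
    # Hive pipes rows in on stdin and collects transformed rows from stdout.
    for line in sys.stdin:
        print(transform(line))
```

Hive would invoke such a script with something like `SELECT TRANSFORM(...) USING 'python transform.py' AS (...) FROM ...`, distributing the per-row work across the cluster.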
YEAR=2017
MMSI=366978710
demo = demo.Demo(YEAR, MMSI)
demo.show_map()
demo.map
demo.hittest({"dt": "2017-01-31T05:16:05", "lat": 54.5261, "lon": -165.6512})
demo.starmap()
A significant number of issues were identified in the given datasets.
pd.DataFrame({
    "Minimum Hole Size (in Days)": ["0", "<=1", "<=2", "<=14", "<=28", "<=56"],
    "Count": [425, 625, 809, 1217, 1325, 1522]
})
So out of 18,817 satellites, only 425 had no holes in the provided data.
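One way such hole sizes could be computed, assuming we have a Series of record dates for each satellite (the function name and inputs here are illustrative, not the actual pipeline):

```python
import pandas as pd

def max_hole_days(record_dates: pd.Series) -> int:
    """Largest gap, in whole days, between consecutive days that have a
    record. A satellite with a record on every day has a hole size of 0."""
    days = record_dates.sort_values().drop_duplicates()
    gaps = days.diff().dt.days - 1  # days skipped between consecutive records
    return int(gaps.max()) if len(days) > 1 else 0

# Example: records on Jan 1, 2, and 6 -> a 3-day hole (Jan 3-5).
dates = pd.Series(pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-06"]))
print(max_hole_days(dates))  # 3
```

Applying this per satellite and counting how many fall at or under each threshold would yield a cumulative table like the one above.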